Automated genome annotation and pathway identification using the KEGG Orthology (KO) as a controlled vocabulary

Authors

  • Xizeng Mao
  • Tao Cai
  • John G. Olyarchuk
  • Liping Wei
Abstract

MOTIVATION: High-throughput technologies such as DNA sequencing and microarrays have created the need for automated annotation of large sets of genes, including whole genomes, and automated identification of pathways. Ontologies, such as the popular Gene Ontology (GO), provide a common controlled vocabulary for these types of automated analysis. Yet, while GO offers tremendous value, it also has certain limitations such as the lack of direct association with pathways.

RESULTS: We demonstrated the use of the KEGG Orthology (KO), part of the KEGG suite of resources, as an alternative controlled vocabulary for automated annotation and pathway identification. We developed a KO-Based Annotation System (KOBAS) that can automatically annotate a set of sequences with KO terms and identify both the most frequent and the statistically significantly enriched pathways. Results from both whole genome and microarray gene cluster annotations with KOBAS are comparable and complementary to known annotations. KOBAS is a freely available stand-alone Python program that can contribute significantly to genome annotation and microarray analysis.
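To make the idea of "most frequent" versus "statistically significantly enriched" pathways concrete, here is a minimal Python sketch. It is not the KOBAS code: the abstract does not say which statistical test is used, so a one-sided hypergeometric test is assumed here as one common choice. The function names, the `gene_to_pathways` mapping, and the toy gene and pathway labels are all hypothetical, and multiple-testing correction is omitted.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for X ~ Hypergeometric(N items, K marked, n drawn)."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def pathway_enrichment(gene_to_pathways, background, query):
    """Rank pathways hit by the query genes (assumed test: hypergeometric).

    gene_to_pathways: dict mapping gene id -> set of pathway names
                      (hypothetical stand-in for KO-based annotation).
    background:       all annotated genes, e.g. a whole genome.
    query:            genes of interest, e.g. a microarray cluster.
    Returns tuples (pathway, query_count, background_count, p_value),
    sorted by ascending p-value.
    """
    background = set(background)
    query = set(query) & background
    N, n = len(background), len(query)

    pathways = set()
    for gene in background:
        pathways |= set(gene_to_pathways.get(gene, ()))

    results = []
    for pathway in pathways:
        # K: background genes in the pathway; k: query genes in the pathway.
        K = sum(1 for g in background if pathway in gene_to_pathways.get(g, ()))
        k = sum(1 for g in query if pathway in gene_to_pathways.get(g, ()))
        if k == 0:
            continue  # pathway not hit by the query set
        results.append((pathway, k, K, hypergeom_sf(k, N, K, n)))

    results.sort(key=lambda r: (r[3], r[0]))
    return results


# Toy data: 10 background genes, 3 query genes, 2 made-up pathways.
gene_to_pathways = {
    "g0": {"A"}, "g1": {"A"}, "g2": {"A", "B"}, "g5": {"B"},
}
background = [f"g{i}" for i in range(10)]
query = ["g0", "g1", "g5"]

for pathway, k, K, p in pathway_enrichment(gene_to_pathways, background, query):
    print(f"{pathway}: {k}/{K} genes in query, p = {p:.4f}")
```

The query count `k` alone gives the frequency ranking, while the p-value asks whether that count is larger than expected given how many background genes belong to the pathway. A real analysis would also correct for testing many pathways at once.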


Similar resources

KOBAS server: a web-based platform for automated annotation and pathway identification

There is an increasing need to automatically annotate a set of genes or proteins (from genome sequencing, DNA microarray analysis or protein 2D gel experiments) using controlled vocabularies and identify the pathways involved, especially the statistically enriched pathways. We have previously demonstrated the KEGG Orthology (KO) as an effective alternative controlled vocabulary and developed a ...


Testing the performance of automated annotation of ESTs with the Kegg Orthology (KO) database demonstrates lack of completeness of clusters.

The KEGG Orthology (KO) database was tested as a source for automated annotation of expressed sequence tags (ESTs). We used a control experiment where every EST was assigned to its cognate protein, and an annotation experiment where the ESTs were annotated by proteins from other organisms. Analyzing the results, we could assign classes to the annotation: correct, changed and speculated. The cor...


KAAS: an automatic genome annotation and pathway reconstruction server

The number of complete and draft genomes is rapidly growing in recent years, and it has become increasingly important to automate the identification of functional properties and biological roles of genes in these genomes. In the KEGG database, genes in complete genomes are annotated with the KEGG orthology (KO) identifiers, or the K numbers, based on the best hit information using Smith-Waterma...


KEGG as a reference resource for gene and protein annotation

KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an integrated database resource for biological interpretation of genome sequences and other high-throughput data. Molecular functions of genes and proteins are associated with ortholog groups and stored in the KEGG Orthology (KO) database. The KEGG pathway maps, BRITE hierarchies and KEGG modules are developed as networks of KO nodes, ...


KAAS: KEGG Automatic Annotation Server

The number of complete and draft genomes has rapidly increased in recent years, and it has become increasingly important to identify the functional properties and biological roles of genes in these genomes. We have been developing KEGG Orthology (KO) to classify gene functions. In KO, we annotate genes in complete genomes based on best-hit information using Smith-Waterman scores, as well as by ...



Journal title:
  • Bioinformatics

Volume 21, Issue 19

Pages -

Publication date: 2005